Behavioral considerations suggest an average reward TD model of the dopamine system
Abstract
Recently there has been much interest in modeling the activity of primate midbrain dopamine neurons as signalling reward prediction error. But since the models are based on temporal-difference (TD) learning, they assume an exponential decline with time in the value of delayed reinforcers, an assumption long known to conflict with animal behavior. We show that a variant of TD learning that tracks variations in the average reward per timestep rather than cumulative discounted reward preserves the models' success at explaining neurophysiological data while significantly increasing their applicability to behavioral data. © 2000 Published by Elsevier Science B.V. All rights reserved.
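The abstract contrasts standard exponentially discounted TD with a variant that subtracts an estimate of the average reward per timestep from the prediction error. A minimal sketch of a tabular average-reward TD(0) update, assuming the generic textbook form of the rule (function and parameter names are illustrative, not the authors' implementation):

```python
def average_reward_td_step(V, rho, s, r, s_next, alpha=0.1, beta=0.01):
    """One average-reward TD(0) update.

    Instead of discounting delayed rewards exponentially, the prediction
    error subtracts `rho`, a running estimate of the average reward per
    timestep:

        delta = r - rho + V(s') - V(s)
    """
    delta = r - rho + V[s_next] - V[s]  # average-reward TD error
    V[s] = V[s] + alpha * delta         # update the value estimate
    rho = rho + beta * delta            # track average reward via the error
    return V, rho, delta
```

With `V = {0: 0.0, 1: 0.0}` and `rho = 0.0`, observing reward 1.0 on the transition 0 → 1 yields an error of 1.0, nudging `V[0]` toward 0.1 and `rho` toward 0.01.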
Similar articles
Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network.
Behavioral conditioning of cue-reward pairing results in a shift of midbrain dopamine (DA) cell activity from responding to the reward to responding to the predictive cue. However, the precise time course and mechanism underlying this shift remain unclear. Here, we report a combined single-unit recording and temporal difference (TD) modeling approach to this question. The data from recordings i...
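The eligibility-trace mechanism mentioned above can be sketched as tabular TD(λ) with accumulating traces. This is the generic textbook form, not the cited study's specific model; all names and step sizes are illustrative:

```python
def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=0.95, lam=0.9):
    """One TD(lambda) update with accumulating eligibility traces.

    Every state carries a trace e[st] marking how recently it was visited;
    the TD error is broadcast to all states in proportion to their traces,
    letting credit reach cues that preceded the reward by several steps.
    """
    delta = r + gamma * V[s_next] - V[s]  # one-step TD error
    e[s] += 1.0                           # mark the current state eligible
    for st in V:
        V[st] += alpha * delta * e[st]    # credit proportional to trace
        e[st] *= gamma * lam              # decay every trace
    return V, e, delta
```

Setting `lam=0` recovers plain TD(0), where only the just-visited state is updated.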
PVLV: the primary value and learned value Pavlovian learning algorithm.
The authors present their primary value learned value (PVLV) model for understanding the reward-predictive firing properties of dopamine (DA) neurons as an alternative to the temporal-differences (TD) algorithm. PVLV is more directly related to underlying biology and is also more robust to variability in the environment. The primary value (PV) system controls performance and learning during pri...
Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System
The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus represen...
TD models of reward predictive responses in dopamine neurons
This article focuses on recent modeling studies of dopamine neuron activity and their influence on behavior. Activity of midbrain dopamine neurons is phasically increased by stimuli that increase the animal's reward expectation and is decreased below baseline levels when the reward fails to occur. These characteristics resemble the reward prediction error signal of the temporal difference (TD) ...
Context and Salience: the Role of Dopamine in Reward Learning and Neuropsychiatric Disorders
Evidence suggests that a change in the firing rate of dopamine (DA) cells is a major neurobiological correlate of learning. The Temporal Difference (TD) learning algorithm provides a popular account of the DA signal as conveying the error between expected and actual rewards. Other accounts have attempted to code the DA firing pattern as conveying surprise or salience. The DA mediat...
Journal: Neurocomputing
Volume: 32-33
Pages: -
Published: 2000